Learning Objectives

After completing this lesson, you’ll be able to:

Introduction

With the SchemaScanner, you can extract and manipulate the schema of your datasets, tackling dynamic workspace challenges such as schema standardization and schema drift. How does it work? The SchemaScanner produces a list attribute containing each attribute's name and data type. Downstream in your workspace, you can use this list attribute to manipulate your schema and build flexible workflows. Instead of defining a fixed schema on your writers, you can use the schema from the list attribute to define the output schema at runtime. Quality assurance and schema drift handling just got easier!
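To make the idea of a runtime schema more concrete, here is a minimal Python sketch, not FME code. It assumes the scanned schema has already been turned into a list of name/type pairs; the `{"name": ..., "type": ...}` layout, the simplified type labels, and the helper names `diff_schemas` and `writer_columns` are all hypothetical. The sketch compares the scanned schema with an expected one to flag drift, then derives output columns from the scanned schema rather than from a hard-coded list.

```python
from typing import Dict, List

# Hypothetical schema layout: an ordered list of name/type entries.
Schema = List[Dict[str, str]]


def diff_schemas(expected: Schema, actual: Schema) -> Dict[str, List[str]]:
    """Report attributes that were added, removed, or retyped (schema drift)."""
    exp = {e["name"]: e["type"] for e in expected}
    act = {a["name"]: a["type"] for a in actual}
    return {
        "added": [n for n in act if n not in exp],
        "removed": [n for n in exp if n not in act],
        "retyped": [n for n in exp if n in act and exp[n] != act[n]],
    }


def writer_columns(actual: Schema) -> List[str]:
    """Build output column definitions from the runtime schema, not a fixed list."""
    return [f"{a['name']} {a['type']}" for a in actual]


expected = [{"name": "id", "type": "int"}, {"name": "name", "type": "string"}]
actual = [
    {"name": "id", "type": "string"},
    {"name": "name", "type": "string"},
    {"name": "owner", "type": "string"},
]
print(diff_schemas(expected, actual))
# {'added': ['owner'], 'removed': [], 'retyped': ['id']}
print(writer_columns(actual))
# ['id string', 'name string', 'owner string']
```

Because the output columns come from `actual`, a new attribute such as `owner` is written without editing any fixed writer definition, while the drift report still tells you that it appeared.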

What is a Schema?

A schema, sometimes referred to as the "data model," can be described as the structure of a dataset or, more accurately, a formal definition of a dataset’s structure.

Each dataset has its own schema, which includes feature types, permitted geometries, user-defined attributes, and other rules that define or restrict its content. However, for most users, the most important aspects of a schema are attribute names and data types.

Using the SchemaScanner

The SchemaScanner processes features and retrieves their schema by recording each attribute's name and data type. It can scan either all incoming features or only a specified number of them. You can also exclude attributes using a regular expression so that the schema output stays clean.
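The scanning step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the SchemaScanner's actual implementation: features are modeled as plain dictionaries, the type labels (`int`, `real`, `string`, `boolean`) are simplified stand-ins for FME data types, and the function names, the int/real widening rule, and the use of `re.search` for the exclusion pattern are our own choices.

```python
import re
from typing import Any, Dict, List, Optional


def infer_type(value: Any) -> str:
    """Map a Python value to a simplified, hypothetical type label."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "real"
    return "string"


def merge_types(a: str, b: str) -> str:
    """Assumed widening rule when features disagree on an attribute's type."""
    if a == b:
        return a
    if {a, b} == {"int", "real"}:
        return "real"
    return "string"


def scan_schema(features: List[Dict[str, Any]],
                max_features: Optional[int] = None,
                exclude_pattern: Optional[str] = None) -> List[Dict[str, str]]:
    """Scan all features (or only the first max_features) and return an
    ordered list of {"name": ..., "type": ...} entries."""
    exclude = re.compile(exclude_pattern) if exclude_pattern else None
    types: Dict[str, str] = {}  # dicts keep first-seen attribute order
    to_scan = features if max_features is None else features[:max_features]
    for feature in to_scan:
        for name, value in feature.items():
            # Skip attribute names matching the exclusion regex.
            if exclude and exclude.search(name):
                continue
            t = infer_type(value)
            types[name] = merge_types(types[name], t) if name in types else t
    return [{"name": n, "type": t} for n, t in types.items()]


features = [
    {"id": 1, "name": "Park", "area": 12, "fme_feature_type": "parks"},
    {"id": 2, "name": "Plaza", "area": 3.5, "fme_feature_type": "parks"},
]
print(scan_schema(features, exclude_pattern=r"^fme_"))
# [{'name': 'id', 'type': 'int'}, {'name': 'name', 'type': 'string'}, {'name': 'area', 'type': 'real'}]
```

Note the trade-off this illustrates: scanning only the first feature (`max_features=1`) is faster, but it would report `area` as `int` because the `real` value appears only in the second feature.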

The result is a new schema feature, which exits through the <Schema> output port. This feature is also given the special attribute fme_schema_handling with the value ‘schema_only’, which lets a dynamic writer recognize it as a schema feature. If you want to keep using the original input features, they are passed through the Output port.
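The routing described above can be modeled in a few lines of Python. This is a hypothetical sketch, not FME's API: features are plain dictionaries, the helper names are illustrative, and the flattened list naming (`attribute{i}.name`, `attribute{i}.fme_data_type`) is assumed to mirror FME's schema-feature convention, so check the documentation before relying on it. The type values used here are simplified labels, not real FME data types such as fme_int32.

```python
from typing import Any, Dict, List

Feature = Dict[str, Any]


def make_schema_feature(schema: List[Dict[str, str]]) -> Feature:
    """Flatten a schema list into one feature flagged as schema-only."""
    feature: Feature = {"fme_schema_handling": "schema_only"}
    for i, entry in enumerate(schema):
        # Assumed list naming: attribute{i}.name / attribute{i}.fme_data_type
        feature[f"attribute{{{i}}}.name"] = entry["name"]
        feature[f"attribute{{{i}}}.fme_data_type"] = entry["type"]
    return feature


def route(features: List[Feature],
          schema: List[Dict[str, str]]) -> Dict[str, List[Feature]]:
    """Send the schema feature to <Schema>; pass the originals through on Output."""
    return {"Schema": [make_schema_feature(schema)], "Output": list(features)}


def is_schema_feature(feature: Feature) -> bool:
    """How a dynamic writer could tell a schema feature from a data feature."""
    return feature.get("fme_schema_handling") == "schema_only"


features = [{"id": 1, "name": "Park"}, {"id": 2, "name": "Plaza"}]
ports = route(features, [{"name": "id", "type": "int"},
                         {"name": "name", "type": "string"}])
print(ports["Schema"][0])
print([is_schema_feature(f) for f in ports["Output"]])
```

The original features reach the Output port unchanged, so the schema feature is the only one carrying the schema_only flag.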

Note

For more technical information on the SchemaScanner, check out the documentation.

Why you might want to use the SchemaScanner in your workspace:

Key things to remember: